Power efficiency


Hybrid unary-binary design for multiplier-less printed Machine Learning classifiers

Armeniakos, Giorgos, Mantzakidis, Theodoros, Soudris, Dimitrios

arXiv.org Artificial Intelligence

Printed Electronics (PE) provide a flexible, cost-efficient alternative to silicon for implementing machine learning (ML) circuits, but their large feature sizes limit classifier complexity. Leveraging PE's low fabrication and NRE costs, designers can tailor hardware to specific ML models, simplifying circuit design. This work explores alternative arithmetic and proposes a hybrid unary-binary architecture that removes costly encoders and enables efficient, multiplier-less execution of MLP classifiers. We also introduce architecture-aware training to further improve area and power efficiency. Evaluation on six datasets shows average reductions of 46% in area and 39% in power, with minimal accuracy loss, surpassing other state-of-the-art MLP designs.
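The multiplier-less idea can be illustrated with a toy thermometer (unary) encoding: multiplying a unary value by an integer weight reduces to replicating bits (pure wiring in hardware), and accumulation reduces to counting ones. The sketch below is illustrative only, not the paper's hybrid architecture; all names are hypothetical.

```python
def to_unary(v, width):
    """Thermometer-encode integer v (0 <= v <= width) as width bits with v ones."""
    return [1] * v + [0] * (width - v)

def unary_scale(bits, w):
    """Scale a unary value by integer weight w without a multiplier:
    replicate each bit w times (in hardware, this is just wiring)."""
    return [b for b in bits for _ in range(w)]

def unary_dot(inputs, weights, width):
    """Multiplier-less dot product: encode, replicate, and count ones."""
    total = 0
    for v, w in zip(inputs, weights):
        total += sum(unary_scale(to_unary(v, width), w))
    return total

# 3*2 + 1*4 = 10, computed with no multiplications
result = unary_dot([3, 1], [2, 4], width=4)
```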


VUSA: Virtually Upscaled Systolic Array Architecture to Exploit Unstructured Sparsity in AI Acceleration

Helal, Shereef, Garcia-Ortiz, Alberto, Bamberg, Lennart

arXiv.org Artificial Intelligence

Leveraging high degrees of unstructured sparsity is a promising approach to enhance the efficiency of deep neural network (DNN) accelerators, which is particularly important for emerging Edge-AI applications. We introduce VUSA, a systolic-array architecture that virtually grows based on the sparsity present to perform larger matrix multiplications with the same number of physical multiply-accumulate (MAC) units. The proposed architecture achieves savings of 37% in area and 68% in power efficiency at the same peak performance, compared to a baseline systolic-array architecture in a commercial 16-nm technology. At the same time, the proposed architecture supports acceleration for any DNN with any degree of sparsity, even none at all. Thus, the architecture is application-independent, making it viable for general-purpose AI acceleration. Over recent years, Artificial Intelligence (AI) has emerged as a revolutionary technology, spreading across industries and enhancing various aspects of our daily lives. The deployment of AI is not confined to powerful data-center machines; it is increasingly demanded in resource-constrained embedded devices, a concept known as Edge AI. Deep Neural Network (DNN) architectures are the backbone of state-of-the-art AI applications performing numerous tasks, such as image processing, speech recognition, natural language processing (NLP), and more [1]. However, DNNs have high computational demands, posing a significant challenge when deploying them in real-world applications.
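The core benefit of unstructured sparsity can be sketched in software: zero weights consume no MAC operations, so the same physical MACs can cover a larger matrix as sparsity grows. This is a behavioral toy model, not the VUSA microarchitecture; the names are hypothetical.

```python
import numpy as np

def sparse_matvec(weights, x):
    """Matrix-vector product that skips zero weights: only nonzero entries
    consume a MAC operation, so higher sparsity frees MAC capacity."""
    y = np.zeros(weights.shape[0])
    macs = 0
    for i, row in enumerate(weights):
        for j in np.nonzero(row)[0]:
            y[i] += row[j] * x[j]
            macs += 1
    return y, macs

W = np.array([[0., 2., 0.], [1., 0., 0.]])  # 4/6 weights are zero
x = np.array([1., 2., 3.])
y, macs = sparse_matvec(W, x)  # same result as W @ x, with 2 MACs instead of 6
```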


A Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression

Ning, Shupeng, Zhu, Hanqing, Feng, Chenghao, Gu, Jiaqi, Pan, David Z., Chen, Ray T.

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence (AI) and deep neural networks (DNNs) have revolutionized numerous fields, enabling complex tasks by extracting intricate features from large datasets. However, the exponential growth in computational demands has outstripped the capabilities of traditional electrical hardware accelerators. Optical computing offers a promising alternative due to its inherent advantages of parallelism, high computational speed, and low power consumption. Yet, current photonic integrated circuits (PICs) designed for general matrix multiplication (GEMM) are constrained by large footprints, high costs of electro-optical (E-O) interfaces, and high control complexity, limiting their scalability. To overcome these challenges, we introduce a block-circulant photonic tensor core (CirPTC) for a structure-compressed optical neural network (StrC-ONN) architecture. By applying a structured compression strategy to weight matrices, StrC-ONN significantly reduces model parameters and hardware requirements while preserving the universal representability of networks and maintaining comparable expressivity. Additionally, we propose a hardware-aware training framework to compensate for on-chip nonidealities to improve model robustness and accuracy. We experimentally demonstrate image processing and classification tasks, achieving up to a 74.91% reduction in trainable parameters while maintaining competitive accuracies. Performance analysis expects a computational density of 5.84 tera operations per second (TOPS) per mm^2 and a power efficiency of 47.94 TOPS/W, marking a 6.87-times improvement achieved through the hardware-software co-design approach. By reducing both hardware requirements and control complexity across multiple dimensions, this work explores a new pathway to push the limits of optical computing in the pursuit of high efficiency and scalability.


What's new when shopping for a laptop in 2025? 8 things to keep in mind

PCWorld

While laptops haven't exactly been advancing by leaps and bounds over the last few years, the industry has finally gotten interesting again. As we close out 2024 and head into 2025, I've got news for you if you're in the market for a new laptop: a lot has changed, and lots more changes are yet to come. Here are the new things you need to know to make an informed laptop buying decision this year. I used to recommend buying last year's laptop models on clearance because hardware hasn't really improved much year over year. Sure, maybe that new laptop is a bit better… but only marginally.


Deep-Unrolling Multidimensional Harmonic Retrieval Algorithms on Neuromorphic Hardware

Andrei, Vlad C., Drăguţoiu, Alexandru P., Béna, Gabriel, Akl, Mahmoud, Li, Yin, Lohrmann, Matthias, Mönich, Ullrich J., Boche, Holger

arXiv.org Artificial Intelligence

This paper explores the potential of conversion-based neuromorphic algorithms for highly accurate and energy-efficient single-snapshot multidimensional harmonic retrieval (MHR). By casting the MHR problem as a sparse recovery problem, we solve it efficiently with the proposed deep-unrolling-based Structured Learned Iterative Shrinkage and Thresholding (S-LISTA) algorithm, which uses complex-valued convolutional neural networks with complex-valued activations trained under a supervised regression objective. Afterward, a novel method for converting the complex-valued convolutional layers and activations into spiking neural networks (SNNs) is developed. At the heart of this method lies the recently proposed Few Spikes (FS) conversion, which is extended by modifying the neuron model's parameters and internal dynamics to account for the inherent coupling between real and imaginary parts in complex-valued computations. Finally, the converted SNNs are mapped onto the SpiNNaker2 neuromorphic board, and a comparison in terms of estimation accuracy and power efficiency is conducted between the original CNNs deployed on an NVIDIA Jetson Xavier and the SNNs. The measurement results show that the converted SNNs achieve almost five-fold power efficiency at moderate performance loss compared to the original CNNs.
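S-LISTA is a learned unrolling of the classic ISTA iteration for sparse recovery. The plain real-valued ISTA baseline it builds on can be sketched as follows; this is the textbook algorithm, not the paper's complex-valued, learned S-LISTA.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.1, n_iter=200):
    """Plain ISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # gradient step on the quadratic, then soft-threshold
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x
```

Each unrolled S-LISTA layer replaces the fixed matrices and threshold here with learned, structured parameters.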


Dynamic Switch Layers For Unsupervised Learning

Li, Haiguang, Pervaiz, Usama, Matuszak, Michał, Kamara, Robert, Roux, Gilles, Thormundsson, Trausti, Antognini, Joseph

arXiv.org Artificial Intelligence

On-device machine learning (ODML) enables intelligent applications on resource-constrained devices. However, power consumption poses a major challenge, forcing a trade-off between model accuracy and power efficiency that often limits model complexity. The previously established Gated Compression (GC) layers offer a solution, enabling power efficiency without sacrificing model performance by selectively gating samples that lack signals of interest. However, their reliance on ground truth labels limits GC layers to supervised tasks. This work introduces the Dynamic Switch Layer (DSL), extending the benefits of GC layers to unsupervised learning scenarios, and maintaining power efficiency without the need for labeled data. The DSL builds upon the GC architecture, leveraging a dynamic pathway selection, and adapting model complexity in response to the innate structure of the data. We integrate the DSL into the SoundStream architecture and demonstrate that by routing up to 80% of samples through a lightweight pass we achieve a 12.3x reduction in the amount of computation performed and a 20.9x reduction in model size. This reduces the on-device inference latency by up to 26.5% and improves power efficiency by up to 21.4% without impacting model performance.
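The routing idea can be sketched with a toy gate: a cheap statistic decides whether a sample carries a signal of interest and sends it down a heavyweight or lightweight pathway accordingly. This is an illustrative stand-in, not the DSL architecture; the gate statistic and threshold are hypothetical.

```python
import numpy as np

def gate_score(x):
    """Toy gate: treat low signal energy as 'no signal of interest'."""
    return float(np.mean(x ** 2))

def dynamic_forward(x, heavy, light, threshold=0.1):
    """Route the sample: run the expensive model only when the gate fires."""
    if gate_score(x) >= threshold:
        return heavy(x), "heavy"
    return light(x), "light"

heavy = lambda x: float(np.tanh(x).sum())  # stand-in for the full model
light = lambda x: 0.0                      # cheap early-exit pass
```

Routing most samples through the light pass is what yields the compute and latency savings.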


Advancing Neuromorphic Computing: Mixed-Signal Design Techniques Leveraging Brain Code Units and Fundamental Code Units

Isik, Murat, Miziev, Sols, Pawlak, Wiktoria, Howard, Newton

arXiv.org Artificial Intelligence

This paper introduces a digital neuromorphic architecture that integrates the Brain Code Unit (BCU) and Fundamental Code Unit (FCU) using mixed-signal design methodologies. Leveraging open-source datasets and the latest advances in materials science, our research focuses on enhancing the computational efficiency, accuracy, and adaptability of neuromorphic systems. The core of our approach lies in harmonizing the precision and scalability of digital systems with the robustness and energy efficiency of analog processing. Through experimentation, we demonstrate the effectiveness of our system across various metrics. The BCU achieved an accuracy of 88.0% and a power efficiency of 20.0 GOP/s/W, while the FCU recorded an accuracy of 86.5% and a power efficiency of 18.5 GOP/s/W. Our mixed-signal design approach significantly improved latency and throughput, achieving a latency as low as 0.75 ms and a throughput of up to 213 TOP/s. These results firmly establish the potential of our architecture in neuromorphic computing, providing a solid foundation for future developments in this domain. Our study underscores the feasibility of mixed-signal neuromorphic systems and their promise in advancing the field, particularly in applications requiring high efficiency and adaptability.


Qualcomm's Snapdragon 8 Gen 3 brings on-device generative AI to more Android phones

Engadget

At its annual Snapdragon Summit on Tuesday, Qualcomm revealed its latest mobile chipset. Perhaps the biggest change in the Snapdragon 8 Gen 3 is the introduction of on-device generative AI (akin to Google's Tensor G3). The chipset's AI Engine supports multi-modal generative AI models and what Qualcomm claims is the world's fastest Stable Diffusion system with the ability to generate an image in under a second. So, you should be able to whip up backgrounds and images for social media posts in a flash. Because GAI requests are handled on-device, Qualcomm says they remain private.


Neural network scoring for efficient computing

Waltsburger, Hugo, Libessart, Erwan, Ren, Chengfang, Kolar, Anthony, Guinvarc'h, Regis

arXiv.org Artificial Intelligence

Much work has been dedicated to estimating and optimizing workloads in high-performance computing (HPC) and deep learning. However, researchers have typically relied on few metrics to assess the efficiency of those techniques: most notably accuracy, prediction loss, and computation time relative to GPU and/or CPU characteristics. It is rare to see figures for power consumption, partly due to the difficulty of obtaining accurate power readings. In this paper, we introduce a composite score that characterizes the trade-off between accuracy and the power consumption measured during the inference of neural networks. For this purpose, we present a new open-source tool allowing researchers to consider more metrics: granular power consumption, but also RAM/CPU/GPU utilization, as well as storage and network input/output (I/O). To the best of our knowledge, it is the first fit test for neural architectures on hardware architectures, made possible by reproducible power-efficiency measurements. We applied this procedure to state-of-the-art neural network architectures on miscellaneous hardware. One of the main applications and novelties is the measurement of algorithmic power efficiency; the objective is to allow researchers to better grasp their algorithms' efficiency. This methodology was developed to explore trade-offs between energy usage and accuracy in neural networks. It is also useful when fitting hardware to a specific task, or when comparing two architectures more accurately with architecture exploration in mind.
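As a hypothetical illustration of a composite accuracy/power score (the paper's exact formula is not reproduced here), one could rank deployments by accuracy per watt:

```python
def efficiency_score(accuracy, power_watts):
    """Hypothetical composite score: accuracy per watt. A higher score means
    the deployment delivers more accuracy for each watt consumed. This is a
    placeholder, not the paper's actual metric."""
    return accuracy / power_watts

# A slightly less accurate model on efficient hardware can outrank a
# slightly more accurate one that draws twice the power:
edge = efficiency_score(0.90, 30.0)    # 30 W edge device
server = efficiency_score(0.92, 60.0)  # 60 W server GPU
```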